EVALUATION


The  objectives  of  this  effort  were  to  implement  an  experimental  data 
repository  and  provide  information  processing  tools  to  assist  the 
in-house  software  reliability  modeling  program.  This  effort  was 
initiated  in  response  to  an  in-house  requirement  for  a computerized 
database  management  capability  for  software  error  data.  Sizeable 
collections  of  software  error  data  had  been  acquired  from  several 
large  software  development  projects  for  the  in-house  program. 

This  effort  satisfactorily  addressed  all  major  program  objectives.  The 
Baseline  Software  Data  System  (BSDS)  was  successfully  implemented  on 
the  RADC  HIS  6180  computer  system.  Capabilities  are  available  for 
defining,  loading,  updating  and  querying  databases.  The  BSDS  also 
provides  capabilities  for  producing  reports,  generating  data  subsets, 
and  interfacing  with  application  programs. 

In  addition  to  the  software  error  database,  a summary  database  and  a 
software  productivity  database  were  also  implemented.  The  BSDS  is 
currently being maintained by the Data and Analysis Center for Software
(DACS) and will be expanded as more data become available.


This effort falls within the goals of the RADC Technology Plan,
specifically TPO-5, C-3 System Availability (Hardware/Software), in
subthrust Software Cost Reduction (Software Data Collection and
Analysis).

Section I
INTRODUCTION 


1.1 Study Objectives and Scope

The objectives of this study effort (Contract Number F30602-77-C-0052)
were to provide RADC in-house research efforts with easy-to-use
information processing tools to assist in their software reliability
modeling efforts, and to implement an experimental data repository to
serve as a test bed for the study and analysis of potential problems
and solutions in establishing and operating the Data & Analysis Center
for Software (DACS). The purpose of the DACS is to upgrade the software
development process through the collection, analysis, and dissemination
of software development experience information. The results of the
study to develop the design for the center are reported in
RADC-TR-76-387, Software Data Repository Study (reference 1).

RADC had previously acquired software error data from six large
software development projects, as reported in references 7 through 12.
The data from these datasets were implemented as the Historical
Database on the Honeywell 6180 Computer System at RADC using the
General Comprehensive Operating Supervisor (GCOS) and the Management
Data Query System (MDQS). These datasets were analyzed in terms of data
content and compared to the data requirements for software reliability
modeling studies. The data from these datasets were also summarized,
along with information from the Final Reports, to form the Summary
Database.

1.2 Report Contents

This volume, Volume I, provides in Section II a feature evaluation of
the MDQS, which was the database management software used for the
implementation of the Baseline Software Data System.

Section  III  contains  an  introductory  discussion  on  each  database 
and  a summary  of  the  evaluation  of  data  requirements  for  software 
reliability  models. 

Volume II provides the user of the Baseline Software Data System with
instructions for defining and retrieving data from the databases using
MDQS.

Section II


MDQS  FEATURE  EVALUATION 


The purpose of this section is to provide a feature evaluation of the
MDQS, which was used as the database management software for the
implementation of the Baseline Software Data System. In this section,
references are made to the applicable MDQS manual and the page number
that describes the feature, in the form (report reference number,
manual page number). Not all of the features of MDQS are discussed,
only those that seem most important and had previously been defined as
a database management requirement for the Software Data Repository
(reference 1).


Included in this section are an overview of MDQS, a discussion of the
database management tools provided for each user type and of the
characterization and structure of the database, a presentation of the
MDQS capabilities for loading, maintaining, and retrieving data, a
discussion of MDQS data security aspects, and the conclusions and
recommendations of this evaluation effort.


2.1 MDQS Overview


MDQS is the Honeywell commercial offering of the World Wide Data
Management System (WWDMS) developed for the World Wide Military Command
and Control System (WWMCCS). It is a subsystem of the GCOS operating
system and runs in both the time-sharing and batch environments. During
this effort two versions of MDQS (designated System/IV (MDQS/IV)) were
tested at the RADC Computer Center:


MDQS Version    GCOS Version    Manual Reference Number

MD 2.0          1G.3            2 and 3
MD 2.2          2H.2            4 and 5

MDQS is a comprehensive database management system which provides
capabilities for database definition, creation, retrieval, maintenance,
restructuring, and report generation, and which operates in both the
online and batch environments. The term online is used here to denote
the appearance to the user rather than the internal operational mode.
The definitions are performed in the batch environment, but the job
control language can be generated interactively online. Retrievals and
maintenance can be performed in batch, online/batch, or online. The
online capability is offered through the Conversational Management Data
Query (CMDQ) subsystem, which allows a user to interactively generate
and execute a procedure from the terminal (5, 7-1).

2.2 Users

MDQS provides database management tools for the database administrator,
the applications programmer, the nonprogrammer, and the parametric
user. Facilities are provided to the database administrator to define,
create, and maintain databases and to establish file protection (all of
reference 4).

Application programmers are computer professionals who are versed in
the current practices of data processing. MDQS provides them with tools
for writing data subsets and interfacing to application programs, for
processing difficult queries, and for generating reports (all of
reference 5). A nonprogrammer (or general user) is typically a person
who is knowledgeable in the functions of an organization but is not
necessarily a computer professional. For this effort it is assumed that
the "nonprogrammer" is familiar with the software engineering field but
does not know the structure of the database. The nonprogrammer can use
some of the basic procedure and query language features to retrieve
data and write simple reports (5, 2-12 and 5, 8-1). CMDQ can also be
used by a nonprogrammer to interactively generate and execute simple
procedures (5, 7-1). Parametric users are support personnel who do not
have programming skills but do have the knowledge required to invoke
predefined transactions. Facilities for the parametric user are
provided by the capability to generate procedures whose parameters are
input at execution time (5, 3-26).

2.3 Database Characterization and Structure

There are three MDQS databases in the Baseline Software Data System:
the Historical Database, the Summary Database, and the RADC
Productivity Database.

The  Historical  Database  consists  of  six  sequential  datasets 
containing  a total  of  31,912  eighty-four  character  records.  Below 
is  a summary  of  the  characteristics  of  each  dataset. 


Dataset Number    Number of Records    Number of Data Items    Number of Record Types

The Summary Database is an indexed-sequential database containing nine
entries (record types) and 135 data items. Each entry contains a key
field which is used to uniquely identify each record occurrence. The
maximum size of the database is approximately seven million characters.
(See Table 2-1 for a breakout of the size of each entry for each
project in the Historical Database.)

The RADC Productivity Database is a sequential database containing 1200
eighty-four-character records consisting of three entries and 31 data
items.

A description  of  the  contents  of  each  of  these  databases  is 
provided  in  Volume  II  and  in  Section  III  of  this  volume. 

These databases were defined using the three MDQS definition languages
(Directory, Data, and Application). The Directory Definition Language
defines the name of the database and the permfile names of the files
associated with the database (4, 3-1). The Data Definition Language is
a COBOL-like description language which describes the attributes
(length, data type, etc.) of the data items and the structure of the
database. The Data Definition constitutes the schema (4, 4-1).

Sub-schemas, which represent the user's view of the data, are defined
using the Application Definition Language. This language defines all of
the databases that are to be accessed by an MDQS procedure (4, 5-1).

Values of data items can be decoded using the Table-Lookup option in
the Application Definition Language (4, 5-19) or in the Procedure
Language (5, C-30). The tables can be generated using the PERFORM
subsystem (5, C-25). The ENCODING/DECODING clause within the data
definition can be used to specify a user subroutine that is to be
executed whenever a data item requiring special conversion is to be
processed or updated by a procedure (4, 4-21).
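The table-lookup decoding described above can be pictured with a short
Python sketch. This is only an illustration of the idea, not MDQS
syntax: the code table, its values, and the function name decode are
hypothetical and do not come from the MDQS manuals.

```python
# Hypothetical code table; the real tables would be built with the MDQS
# PERFORM subsystem, and these code values are invented for illustration.
SEVERITY_TABLE = {"1": "critical", "2": "major", "3": "minor"}


def decode(value, table, default="UNKNOWN"):
    """Return the decoded text for a stored code, or a default if the code is not in the table."""
    return table.get(value, default)


# A stored record holds the compact code; the user's view shows the decoded text.
record = {"SPR-NUM": "0042", "SEVERITY": "2"}
decoded = dict(record, SEVERITY=decode(record["SEVERITY"], SEVERITY_TABLE))
```

Keeping the table separate from the records means a code can be given a
new meaning without rewriting the stored data.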

The database directory is available for display for an application
definition by use of the Application Definition File Query (ADFQ)
subsystem (5, 6-1). This capability allows for the listing of data-item
name, type, and number of characters for each entry within an
application definition.

TABLE 2-1. SUMMARY DATABASE SIZE

Singular, hierarchical, and network are the three allowable MDQS data
structures (4, 1-7). The singular data structure consists of only one
type of element with no dominant or subordinate relationships, while
the hierarchical data structure consists of elements that can be
related to any number of lower level elements but only one higher level
element. The network data structure consists of elements that can be
related to any number of lower level elements and any number of higher
level elements. Figure 2-1 contains a pictorial representation of the
three data structures. The data structure represents the logical view
of the data.
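The distinction among the three data structures can be made concrete
with a small Python sketch. It is a minimal illustration under stated
assumptions: the class Element and the functions link and classify are
hypothetical names introduced here, and the element names are examples
only.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Element:
    """One element type in a logical data structure (hypothetical model)."""
    name: str
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)


def link(parent, child):
    """Relate a higher level element to a lower level element."""
    parent.children.append(child)
    child.parents.append(parent)


def classify(elements):
    """Name the least general of the three structures that fits the elements."""
    if all(not e.parents and not e.children for e in elements):
        return "singular"        # no dominant or subordinate relationships
    if all(len(e.parents) <= 1 for e in elements):
        return "hierarchical"    # at most one higher level element each
    return "network"             # some element has several higher level elements


project, module, report = Element("project"), Element("module"), Element("report")
flat = classify([project])
link(project, module)
link(module, report)
tree = classify([project, module, report])
link(project, report)            # a second higher level element for "report"
net = classify([project, module, report])
```

The only thing that separates a hierarchy from a network in this model
is whether any element has more than one higher level element.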

The  allowable  file  organizations  (storage  structures)  for 
MDQS  are  sequential,  indexed  sequential,  and  integrated.  For  a 
sequential  file  organization  the  records  are  stored  serially  and 
the  only  way  of  physically  accessing  a record  is  to  read  all  records 
that  precede  it,  beginning  with  the  first  record  in  the  file. 

An indexed sequential file is a collection of records that can be
accessed either sequentially in key value order or randomly by a
particular key value. It consists of a data file and an index file. An
integrated file is a collection of records that may contain complex
inter-record relationships where the record association is achieved
through chains which provide cross-reference linkages between records.
The allowable data structures for each file organization are
illustrated in Table 2-2 (4, 1-7).
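The difference between sequential and indexed sequential access can be
sketched in Python as follows. This is an illustrative model only, not
the Indexed Sequential Processor: the class IndexedSequentialFile, the
function sequential_find, and the sample keys are all hypothetical.

```python
import bisect

# Hypothetical (key, data) records in arrival order.
records = [("A003", "third"), ("A001", "first"), ("A002", "second")]


def sequential_find(file, key):
    """Read records from the start of the file until the key matches; also count the reads."""
    for reads, (k, data) in enumerate(file, start=1):
        if k == key:
            return data, reads
    return None, len(file)


class IndexedSequentialFile:
    """A data file kept in key order plus an index of its keys."""

    def __init__(self, records):
        self.data = sorted(records)              # data file, in key value order
        self.index = [k for k, _ in self.data]   # index file: the sorted keys

    def find(self, key):
        """Random access: locate the record through the index instead of scanning."""
        i = bisect.bisect_left(self.index, key)
        if i < len(self.index) and self.index[i] == key:
            return self.data[i][1]
        return None

    def scan(self):
        """Sequential access in key value order."""
        return [k for k, _ in self.data]


isf = IndexedSequentialFile(records)
```

In the sequential case the cost of finding a record grows with its
position in the file, while the indexed file answers a key lookup
without reading the preceding records.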

The integrated file structure is effected in MDQS by the use of
Integrated Data Store (I-D-S) (references 13 and 14), and the indexed
sequential file by the use of the Indexed Sequential Processor
(reference 14). These two file structures were studied to determine the
feasibility of their use for the Summary Database. It was determined
that an indexed sequential file organization was the most effective
means for implementing the Summary Database.

When using an integrated file structure, the data definitions and query
procedures become complex because of the need to define chains,
retrieval mechanisms, and physical storage requirements (4, 4-34). By
defining unique keys in the indexed sequential file for each record
occurrence, a relational system was in effect achieved. This provides
more flexibility to expand the definitions and to transfer to another
data management system, if requirements dictate.


TABLE 2-2. DATA STRUCTURES/FILE ORGANIZATIONS

                                 File Organization
Data Structures    Sequential    Indexed Sequential    Integrated

Singular               X                 X                  X
Hierarchical           X                 X                  X
Network                                                     X

The RADC Productivity Database, the transaction files for the Summary
Database, and the six datasets for the Historical Database are defined
as sequential files with singular data structures. The Summary Database
is defined as an indexed sequential file with a hierarchical data
structure.

2.4 Data Loading and Maintenance

There are various options for initially loading a database, depending
on the file structure. The data can be loaded external to MDQS through
the use of system utilities, HOL programs using the standard I/O
routines, the Indexed Sequential Processor (reference 15), or
Integrated Data Store (references 13 and 14); data loaded this way must
follow the standards for the specific file structure (4, 2-8). The
Historical and Productivity Databases were loaded using utilities (see
reference 6, Appendix B). The Summary Database was loaded using a
combination of the Indexed Sequential Processor, Fortran programs, and
the MDQS LOAD function.

Within MDQS, data can be loaded through the self-contained LOAD
function of the Conversational MDQS Language (CMDQ) subsystem
(5, 7-200). The LOAD function is used to generate a new sequential or
indexed sequential entry from a terminal, using a prompting method for
inputting data item values.

The READ statement of the Procedure Language (5, 5-103) causes data to
be read from a non-database file into a specified structure. The data
can be read from a permfile on a removable device or from a magnetic
tape.


Updating is performed (except for sequential files) by the use of
the  UPDATE  function  within  CMDQ  (5,  7-11),  by  the  use  of  the 
UPDATE  statement  of  the  Procedure  Language  (5,  5-149),  and  by  the 
use  of  the  UPDATE  clause  in  the  RETRIEVE  Statement  of  the  Procedure 
Language  (5,  5-131).  These  are  used  in  conjunction  with  other 
statements  of  the  Procedure  Language  including  DELETE  (5,  5-48), 
INSERT  (5,  5-63),  STORE  (5,  5-146),  and  RESTORE  (5,  5-126).  There 
are  restrictions  on  the  use  of  these  capabilities  and  Appendix  F 
of  reference  5 provides  guidelines  for  using  these  functions 
dependent  upon  the  file  structure.  The  use  of  this  updating  feature 
requires  that  separate  transaction  files  be  initially  generated 
with  the  updated  data  and  then  updating  is  performed.  Figure  2-2 
illustrates  the  overall  flow  for  updating  the  indexed-sequential 
Summary  Database  with  a sequential  transaction  file. 
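The general pattern of updating a keyed master file from a separate
transaction file can be sketched in Python. This is a simplified model
and not the MDQS update flow of Figure 2-2: the action names, the
function apply_transactions, and the sample keys and values are
hypothetical, and the rule for rejecting a transaction is an assumption
made for the example.

```python
def apply_transactions(master, transactions):
    """Apply (action, key, fields) transactions to a keyed master file.

    Returns the transactions that could not be applied (assumed rule:
    inserts need a new key; updates and deletes need an existing key).
    """
    rejected = []
    for action, key, fields in transactions:
        if action == "INSERT" and key not in master:
            master[key] = dict(fields)
        elif action == "UPDATE" and key in master:
            master[key].update(fields)
        elif action == "DELETE" and key in master:
            del master[key]
        else:
            rejected.append((action, key))
    return rejected


# Hypothetical master file keyed by a unique record key.
master = {"P1": {"PROJ-TYPE": "C2"}, "P2": {"PROJ-TYPE": "AV"}}

# Hypothetical sequential transaction file, processed in file order.
transactions = [
    ("UPDATE", "P1", {"PROJ-TYPE": "CC"}),
    ("INSERT", "P3", {"PROJ-TYPE": "RT"}),
    ("DELETE", "P2", {}),
    ("UPDATE", "P9", {"PROJ-TYPE": "XX"}),   # no such key, so it is rejected
]
rejected = apply_transactions(master, transactions)
```

Collecting the changes in a transaction file first makes it possible to
review or rerun an update without touching the master file by hand.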

MDQS does not provide a Host Language capability whereby an application
program can directly access the database through the use of a CALL or
language verb. However, if the database is an integrated file, I-D-S
can be used; if the database is indexed sequential, the Indexed
Sequential Processor can be used; and if the database is sequential,
the GCOS file system, which is standard for all the GCOS procedure
languages, can be used.

The Data Directory feature in MDQS allows for the listing of the
attributes of data items and entries within a database, but does not
provide cross-reference information in terms of relationships to other
data items or the utilization of the items.

Validity checking is performed by the use of the CHECK/IS clause in the
data definition (4, 4-19) and is executed whenever a value is changed
or added to the data item during a batch execution by procedure. The
actual checking is accomplished by a user subroutine and/or by
specifying the valid PICTURE clause and a value range.
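Checking a value against a PICTURE and a value range can be illustrated
with a Python sketch. It handles only a small assumed subset of COBOL
PICTURE syntax (the symbols 9, A, and X with optional repeat counts)
and is not the MDQS CHECK/IS implementation; the function names
picture_to_regex and is_valid are hypothetical.

```python
import re


def picture_to_regex(picture):
    """Translate a small subset of PICTURE syntax (9, A, X with optional counts) to a regex."""
    classes = {"9": r"\d", "A": r"[A-Za-z ]", "X": r"."}
    parts = []
    for symbol, count in re.findall(r"([9AX])(?:\((\d+)\))?", picture):
        parts.append(classes[symbol] + "{%s}" % (count or 1))
    return re.compile("".join(parts))


def is_valid(value, picture, low=None, high=None):
    """Accept a value only if it fits the picture and, when given, the numeric range."""
    if not picture_to_regex(picture).fullmatch(value):
        return False
    if low is not None and high is not None:
        # The range check assumes an all-numeric picture.
        return low <= int(value) <= high
    return True
```

For example, is_valid("3", "9", 1, 5) accepts a one-digit code within
the range 1 through 5, while is_valid("7", "9", 1, 5) rejects a
well-formed digit that falls outside the range.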

MDQS  provides  facilities  for  reorganizing  the  PICTURE 
and  USAGE  clauses  and  adding  or  deleting  groups,  data  items,  and 
records.  The  picture  changes  allowed  are  those  as  permitted  by 
a COBOL  MOVE  statement. 

New Directory Definitions and Data Definitions must be translated, and
then the actual restructuring is performed using an MDQS utility
function (4, 2-10 and 4, 6-1). Figure 2-3 illustrates the basic steps
needed for restructuring a sequential or indexed-sequential file. The
new and old data definition source code is used as input to an MDQS
utility routine, and a COBOL program is generated, compiled, and
executed to perform the restructuring. The process for integrated files
uses I-D-S utility programs.

MDQS provides for a checkpoint and restart capability for both the
database entry that is being used during a procedure and the coincident
memory image of the procedure (5, 5-34 and 5, 5-164). The frequency of
checkpoints can be specified and a segment of a procedure is executed
through the use of the CHECKPOINT/ROLLBACK statement. The capability is
only valid for those databases which have concurrent update protection
specified in the Directory Definition and the SHARED or EXCLUSIVE mode
in the procedure.
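The idea of checkpointing at a specified frequency and rolling back to
the last checkpoint can be sketched in Python. This is a conceptual
model only and does not reproduce the MDQS CHECKPOINT/ROLLBACK
statement: the class CheckpointedEntry, its parameter every, and the
sample data are hypothetical.

```python
import copy


class CheckpointedEntry:
    """Keep a saved copy of an entry and restore it on rollback (hypothetical model)."""

    def __init__(self, data, every=2):
        self.data = data
        self.every = every                  # checkpoint after this many updates
        self._saved = copy.deepcopy(data)
        self._since = 0

    def update(self, key, value):
        self.data[key] = value
        self._since += 1
        if self._since >= self.every:
            self.checkpoint()

    def checkpoint(self):
        self._saved = copy.deepcopy(self.data)
        self._since = 0

    def rollback(self):
        """Discard every change made since the last checkpoint."""
        self.data = copy.deepcopy(self._saved)
        self._since = 0


entry = CheckpointedEntry({"SEVERITY": "1"}, every=2)
entry.update("SEVERITY", "2")
entry.update("PHASE", "3")       # second update triggers a checkpoint
entry.update("SEVERITY", "9")    # made after the checkpoint
entry.rollback()                 # the last update is undone
```

A higher checkpoint frequency loses less work on rollback at the cost
of saving the state more often.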

MDQS does not provide for the capability of capturing information about
changes made to the database and usage characteristics, although
various logging facilities and sampling techniques of GCOS can be
utilized.

2.5 Retrieval and Report Generation

Through a self-contained procedure language, the MDQS retrieval and
report generation capability provides for qualifying a subset of the
database, sorting and/or formatting this subset, and printing this
subset directly to the requesting computer terminal. The basic
retrieval capability is accomplished by the use of the INVOKE and
RETRIEVE statements with the incorporation of a conditional expression
which qualifies the data subset of interest (5, 5-65 and 5, 5-128). The
SORT statement specifies the order of the sort according to a maximum
of 50 key fields (5, 5-141).
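Qualifying a subset with a conditional expression and then sorting it
on several key fields can be sketched in Python. This shows the general
technique only and is not MDQS Procedure Language: the function
retrieve and the sample records are hypothetical, although the data
item names follow the Baseline Data Requirements List.

```python
def retrieve(records, condition, sort_keys=()):
    """Select the records that satisfy the condition, ordered by the given key fields."""
    if len(sort_keys) > 50:
        raise ValueError("at most 50 sort key fields are allowed")
    subset = [r for r in records if condition(r)]
    return sorted(subset, key=lambda r: tuple(r[k] for k in sort_keys))


# Hypothetical problem report records.
sprs = [
    {"SPR-NUM": "0003", "SEVERITY": "2", "DATE-OPEN": "770301"},
    {"SPR-NUM": "0001", "SEVERITY": "1", "DATE-OPEN": "770215"},
    {"SPR-NUM": "0002", "SEVERITY": "2", "DATE-OPEN": "770110"},
]

# Qualify on severity, then order by opening date and report number.
result = retrieve(
    sprs,
    lambda r: r["SEVERITY"] == "2",
    sort_keys=("DATE-OPEN", "SPR-NUM"),
)
```

Earlier key fields take precedence, so later fields only break ties
among records that agree on the earlier ones.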

MDQS procedures may reference user application COBOL, Fortran, or GMAP
programs that perform data validation, encoding and decoding, table
lookups, and data transformation (5, C-1).

In addition, the results of a procedure can be written to a system
standard permfile and subsequently utilized by an application program.
The results of a retrieval can also be output to the printer, to the
online terminal, to a magnetic tape, or to a permfile that can be
printed on the terminal (5, 5-81).

A tutorial method for generating MDQS procedures is available through
the use of the CMDQ subsystem (5, 7-1), and the Query Procedure
Language offers a simpler method of retrieving data than the standard
procedure language (5, 8-1). The procedure language also allows
parameters to be defined that are supplied at execution time (5, 3-26).

An extensive reporting capability is available through the use of the
REPORT, LINE, and SPACE statements (5, 2-27) and through the use of
various editing options (5, 3-30).

Multiple users can access an MDQS database concurrently through a
concurrent access environment which protects the integrity of the
contents of the files and prevents interference between multiple users
(4, 2-10). The databases must initially be established with concurrent
update protection by using the GCOS File Management Supervisor (FMS)
ACCESS/MONITOR and ABORT/ROLLBACK options. The database access is then
defined as PROTECTED in the Directory Definition (4, 3-6).

2.6 Security

MDQS  uses  the  GCOS  File  Security  System  (FILSYS)  for  file 
security  and  provides  facilities  for  specifying  the  privacy 
protection  and  for  controlling  access  to  the  databases  by  MDQS 
procedures (4, 1-2). The database administrator is responsible
for  assigning  locks  and  keys,  generating  a privacy  file,  and 
defining  the  locks  in  the  Data  Definition  (4,  7-1). 

The Privacy file is created by the use of the Privacy Command within
the Privacy subsystem and establishes corresponding locks and keys for
User IDs (4, 7-7). The privacy locks at the record level are defined in
the record complete entry of the Data Definition, where the lock(s)
apply to the reading and writing by an MDQS Procedure of all data items
within the record (4, 4-8). The locks for each individual data item are
defined in the group/item entry of the Data Definition (4, 4-22).
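The interplay of record-level locks, item-level locks, and user keys
can be sketched in Python. The matching rule used here (a user must
hold a key for every lock on the record and on the requested item) is
an assumption made for illustration, not a statement of how MDQS
resolves locks; the function can_access and all lock and key names are
hypothetical.

```python
def can_access(user_keys, record_locks, item_locks, item):
    """Assumed rule: the user must hold a key for every record lock and every lock on the item."""
    required = set(record_locks) | set(item_locks.get(item, ()))
    return required <= set(user_keys)


# Hypothetical locks defined for one record type.
record_locks = {"L-REC"}                    # applies to all data items in the record
item_locks = {"PROB-DESC": {"L-DESC"}}      # extra lock on one data item

# Hypothetical keys assigned to two User IDs in the privacy file.
analyst = {"L-REC"}
administrator = {"L-REC", "L-DESC"}
```

Under this model the analyst can read an unprotected item such as
SEVERITY but not PROB-DESC, while the administrator can read both.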

2.7  Conclusions  and  Recommendations 


Overall, MDQS provides the basic database management features necessary
for the implementation of a Data and Analysis Center for Software
(DACS). The three most powerful features of MDQS are its report
production capabilities, its data structuring alternatives, and the
database administrator tools, including the schema-subschema facility.
It is also very important that during this effort MDQS, with only a few
exceptions, performed its functions as described in the documentation.
The weakest feature of MDQS is the syntax of the Procedure Language, in
that it is somewhat cumbersome to use for generating complicated
queries. MDQS is also limited in the tools it provides for database
maintenance. These limitations can be compensated for through the use
of GCOS utilities and user HOL programs.

It is recommended that MDQS continue to be used as the database
management software for the development of the Baseline Software Data
System to establish the framework for the evolution into a pilot DACS
and then into a fully operational center.


Section  III 


DATABASE  DESCRIPTIONS  AND  DATA  REQUIREMENTS 


This  section  provides  an  introduction  to  the  Historical, 
Summary,  and  Productivity  Databases.  Also  included  in  this 
section  is  a summary  of  the  work  performed  during  this  effort  on 
the  evaluation  of  data  requirements  for  software  reliability 
models.  The  types  of  data  required  are  listed  in  Figure  3-1 
along  with  a short  description  of  each  data  item. 

3.1 Baseline Databases

3.1.1 Historical Database. The Historical Database consists of six
datasets that contain problem reporting and module descriptive
information on six large software development projects. The data items
available for each dataset are indicated in Table 3-1, using as a basis
the data items listed in the data requirements list (Figure 3-1). There
are two columns associated with each project. The first column provides
the number of characters that are needed to represent the data item,
and the second column indicates the maximum number of occurrences for
each problem recorded.

Following  is  a short  description  of  the  six  projects  that 
constitute  the  data  for  the  Historical  Database. 

Project 1 - This dataset contains Software Problem Reports (SPR) from a
large Command and Control System consisting of 115,346 Jovial/J4
source statements and 249 program modules. The project itself and
the dataset are discussed in Reference 7, where the project is
referred to as Project 3.

There  is  a total  of  4,970  Software  Problem  Report 
records  consisting  of  the  SPR  number,  the  date  opened  and 
closed,  the  module  which  manifested  the  error,  the  module 
that  was  changed,  the  error  category  and  the  severity  of 
the  error,  the  test  period,  the  correction  type,  and  the 
Software  Modification  Notice  (SMN)  number.  There  is  a 
record  occurrence  for  each  modification  made.  Every  SPR 
required  at  least  one  SMN,  and  one  SMN  could  have  closed 
more  than  one  SPR.  Therefore,  the  SPR  numbers  are  not 
unique  and  the  SMN  numbers  are  not  unique. 
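Because one SPR can require several SMNs and one SMN can close several
SPRs, a count of records in this dataset is not a count of problems.
The following Python sketch shows one way to cross-reference such
records; the function cross_reference and the sample SPR and SMN
numbers are hypothetical and are not taken from the Project 1 data.

```python
from collections import defaultdict

# Hypothetical (SPR number, SMN number) pairs, one per modification record.
records = [("S001", "M010"), ("S001", "M011"), ("S002", "M011")]


def cross_reference(records):
    """Group SMNs by SPR and SPRs by SMN, since neither number is unique."""
    smns_by_spr = defaultdict(set)
    sprs_by_smn = defaultdict(set)
    for spr, smn in records:
        smns_by_spr[spr].add(smn)
        sprs_by_smn[smn].add(spr)
    return smns_by_spr, sprs_by_smn


smns_by_spr, sprs_by_smn = cross_reference(records)
distinct_sprs = len(smns_by_spr)   # number of problems, not number of records
```

Counting distinct SPR numbers rather than records avoids overstating
the number of problems when a single problem required several
modifications.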

Project 2 - This dataset contains Software Problem Reports and Module
descriptions from an Avionics System consisting of 40,640
Jovial/J3B source statements and 84,065 Assembly Language
statements. The description of the collection and analysis of this
dataset is contained in Reference 8.

010  PROJ-ID        PROJECT IDENTIFICATION
020  PROJ-VERSION   PROJECT VERSION
030  PROJ-TYPE      PROJECT TYPE
040  SYS-ID         SYSTEM IDENTIFICATION
050  SYS-VERSION    SYSTEM VERSION
060  SYS-TYPE       SYSTEM TYPE
070  SSYS-ID        SUBSYSTEM OR FUNCTIONAL AREA IDENTIFICATION
080  SSYS-VERSION   SUBSYSTEM VERSION
090  SSYS-TYPE      SUBSYSTEM TYPE
100  MOD-ID         MODULE IDENTIFICATION
110  MOD-VERSION    MODULE VERSION
120  MOD-TYPE       MODULE TYPE
130  COMP-ID        COMPUTER IDENTIFICATION
140  COMP-OM        COMPUTER OPERATING MODE
150  COMP-RATE      COMPUTER PROCESSING RATE
160  COMP-OS        COMPUTER OPERATING SYSTEM TYPE
170  TECH-ID        IDENTIFICATION OF THE CONSTRUCTION TECHNOLOGY
180  COMPL-ID       TYPE OF COMPLEXITY MEASURE USED
190  COMPLEXITY     THE COMPLEXITY MEASURE VALUE
200  CONST-TYPE     CONSTITUENT TYPE (EX. JOVIAL, ASSEMBLY LANGUAGE)
210  NUM-OCCUR      NUMBER OF OCCURRENCES OF CONSTITUENT TYPE
220  PHASE          PHASE IN WHICH ACTION OCCURRED
230  NUM-RUNS-TOT   TOTAL NUMBER OF RUNS
235  TEST-PER       THE PERIOD IN WHICH THE TEST WAS PERFORMED
240  NUM-RUNS-OK    TOTAL NUMBER OF CORRECT RUNS
250  AHRS-PER-TEST  AVERAGE NUMBER OF HOURS PER TEST
260  TEST-ID        TEST IDENTIFICATION
270  TEST-TYPE      TYPE OF TEST
280  DATE-RUN       DATE THE TEST WAS RUN
290  STRESS-TYPE    TYPE OF STRESS APPLIED
300  STRESS-MEAS    AMOUNT OF STRESS APPLIED
310  TEST-RESULT    RESULT OF TEST
315  NUM-ERR        NUMBER OF ERRORS DISCOVERED PER TEST
320  SPR-NUM        SOFTWARE PROBLEM REPORT NUMBER
330  DATE-OPEN      DATE THE PROBLEM WAS REPORTED
340  MOD-SOURCE     THE MODULE ID WHERE THE PROBLEM WAS MANIFESTED
350  ERR-CAT-TYPE   ERROR CATEGORY TYPE
360  ERROR-CAT      ERROR CATEGORY CODE
370  SEV-TYPE       SEVERITY TYPE
380  SEVERITY       SEVERITY
390  TYPE-TERM      TYPE OF TERMINATION
400  HRS-TO-DISC    HOURS TO DISCOVERY
405  WORK-CAT       THE TYPE OF DEVELOPMENT TASK PERFORMED
410  SMN-NUM        SOFTWARE MODIFICATION NOTICE NUMBER
420  MOD-CHANGED    THE ID OF THE CHANGED MODULE
430  MOD-CH-VERS    THE VERSION OF THE CHANGED MODULE
440  COR-TYPE       CORRECTION TYPE
450  COR-MECH       CORRECTION MECHANISM
455  ACT-CAT        THE TYPE OF TEST PERFORMED
460  DATE-BEGUN     DATE WHEN PROBLEM SOLUTION WAS INITIATED
470  DATE-CLOSE     DATE WHEN PROBLEM WAS REPORTED TO BE CLOSED
480  DAYS-OPEN      NUMBER OF DAYS BETWEEN DATE OPEN AND DATE CLOSE
490  HHRS-TO-FIX    HUNDREDTHS OF HOURS TO FIX
500  NUM-CHANGED    NUMBER OF SOURCE STATEMENTS CHANGED
510  CODE-CONT      A CODE THAT INDICATES AN SPR DOCUMENTS MORE THAN
520  PROB-DESC      A DESCRIPTION OF THE PROBLEM
530  CORR-DESC      A DESCRIPTION OF THE CORRECTION
540  ERROR-DESC     A DESCRIPTION OF THE ERROR

Figure 3-1. Baseline Data Requirements List

TABLE 3-1. ATTRIBUTE MATRIX
AUGUST 1978

Columns: PROJ 1 through PROJ 6, each with NUM CHR (number of
characters) and MAX NUM (maximum number of occurrences). Each entry
below is written NUM CHR/MAX NUM. Entries appear in column order;
projects that do not record an attribute have no entry.

ATTRIBUTE NAME   NUM CHR/MAX NUM

PROJ-ID          2/1   5/1
PROJ-VERSION     6/1   3/1
PROJ-TYPE
SYS-ID           1/1   1/1
SYS-VERSION      2/1
SYS-TYPE         1/2
SSYS-ID          4/1   3/1   1/1   3/1
SSYS-VERSION     3/1   7/1
SSYS-TYPE        1/1
MOD-ID           4/1   7/1   8/1   16/1
MOD-VERSION      2/1
MOD-TYPE         1/1   1/1
COMP-ID          13/1
COMP-OM
COMP-RATE        7/1
COMP-OS          13/1
TECH-ID          11/1  1/1   12/1
COMPL-ID
COMPLEXITY       1/1
CONST-TYPE       1/2   1/1   7/1
NUM-OCCUR        5/2   5/2   6/1
PHASE            1/1   1/1   1/1   2/1
NUM-RUNS-TOT     3/1
TEST-PER         2/1   1/1   1/1
NUM-RUNS-OK      3/1
AHRS-PER-TEST    3/1
TEST-ID          8/1
TEST-TYPE
DATE-RUN         5/1
STRESS-TYPE
STRESS-MEAS      6/1
TEST-RESULT      1/1
NUM-ERR          1/1
SPR-NUM          4/1   3/1   4/1   4/1   7/1
DATE-OPEN        6/1   6/1   6/1   6/1
MOD-SOURCE       7/1
ERR-CAT-TYPE
ERROR-CAT        5/1   5/1   5/1   4/1   5/1   2/1
SEV-TYPE
SEVERITY         1/1   1/1   1/1
TYPE-TERM        1/1   1/1
HRS-TO-DISC      5/1
WORK-CAT         1/1
SMN-NUM          6/1   4/1   6/1
MOD-CHANGED      7/1   4/13  7/1   8/1
MOD-CH-VERS      2/1
COR-TYPE         6/1   1/1   5/1   9/1
COR-MECH         1/1
ACT-CAT          1/1
DATE-BEGUN
DATE-CLOSE       6/1   6/1   6/1   6/1   6/1
DAYS-OPEN        3/1   3/1
HHRS-TO-FIX      3/1
NUM-CHANGED      1/1
CODE-CONT        1/1   1/1
PROB-DESC        99/3
CORR-DESC        99/3
ERROR-DESC       50/1
There is a total of 2,036 Software Problem Report records
containing the SPR number, the date opened and closed, the
module(s) that were changed, the error category, the phase in which
the error was introduced, the CPU hours to discovery, the
correction type, and the hundredths of hours of CPU time to fix.
Every SPR number is unique, and if more than one module needed to
be changed, all the module names are contained in the same record.

There are data on 69 modules, which contain the name
of the module and a functional area designation, the
programming language(s) used, and the number of source
statements. There are eight records that contain descriptive
information on the type of hardware and software used and
descriptions of the testing phases.

Project 3 - This dataset consists of Software Problem Reports
and Module descriptions from a real-time control system for
a land-based radar system. The software system is made up
of 109 modules with a total of 86,780 Jovial/J3 source
statements and 49,000 Assembly Language statements. The
description of this project is contained in Reference 9.

There  is  a total  of  2,165  Software  Problem  Report 
records  containing  the  SPR  number,  the  date  opened  and 
closed,  the  module  that  was  changed,  the  error  category 
and  the  severity  of  the  error,  the  test  period,  the  phase 
in  which  the  error  was  introduced,  the  correction  type, 
and  the  Software  Modification  Notice  number.  There  is  one 
record  occurrence  for  each  modification  made  and  each  SMN 
number is unique. The SPR numbers and the SMN numbers are
the  same  except  that  there  are  some  blank  SPR  numbers. 

Project 4 - This dataset contains Software Modification Reports
from the flight software of an onboard guidance, navigation
and control system for both a command module and a lunar
module. There were 16 flight programs (releases), and the
total size of all releases was approximately 610,000
computer words. Summed over the releases, the number of
words added or changed since the preceding release
was 83,866. The majority of the software was coded using
assembly language with interpretive code interspersed
throughout. A description of this project and an
interpretation of the data is contained in Reference 9.

There is a total of 11,730 Software Problem Report
records containing the SPR number, the date closed, the
error category, the phase in which the error was
introduced, and the SMN number. There is a record occurrence
for each modification made and each SMN number is unique.
The SPR number references a document that established the
basis for the change but is only available for about 13%
of the records.
Project 5 - This dataset consists of Software Problem Reports and
Module descriptions from a large, ground-based, real-time
data processing system. The majority of the software was
coded using CENTRAN (an intermediate-level language
resembling a subset of PL/1) interspersed with assembly
language and system macros. A description of this project
is contained in Reference 11.


There is a total of 5,693 Software Problem Report
records  containing  the  SPR  number,  the  date  opened  and 
closed,  the  module  that  was  changed,  the  error  category, 
the  phase  in  which  the  error  was  introduced,  and  the 
correction  type.  There  is  a record  occurrence  for  each 
problem  encountered.  If  the  problem  required  more  than 
one  solution,  only  one  solution  was  recorded  which  was 
established  using  a priority  scheme. 

There  are  data  on  2,431  modules  which  contain  the  name 
of  the  module,  the  number  of  instructions,  the  language 
used,  and  the  type  of  construction. 

Project 6 - This dataset consists of run and failure analysis
data from the development of the Launch Support Data
Base (LSDB), which includes database management
functions and fairly complex scientific calculations.

There  is  a total  of  2,719  run  analysis  records  that 
report  484  errors.  The  records  contain  the  module  ID,  the  date 
and  time  run,  the  result  of  the  test,  the  test  period  and 
activity,  the  severity,  error  category,  and  number  of  errors. 
There  is  a record  occurrence  for  each  run  (test)  made. 

Below  is  a summary  of  the  size  of  the  datasets  within 
the  Historical  Database. 


                 Software           Module             Run Analysis
                 Problem Reports    Characteristics    Reports

Project 1         4,970
Project 2         2,036                 69
Project 3         2,165                109
Project 4        11,730
Project 5         5,693              2,413
Project 6                                                 2,719


3.1.2  Summary  Database.  The  Summary  Database  was  developed  so 
that  queries  could  be  formulated  across  the  projects.  The 
failure  and  correction  information  from  the  Historical  Database 
was  summarized  and  incorporated  into  the  Summary  Database.  The 
project/module  attribute,  environment,  and  productivity  data 
from  the  Final  Reports  (references  7-12)  were  extracted,  coded 
and put into computer readable form.

Figure  3-2  illustrates  the  three-dimensional  aspect  of  the 
Summary  Database. 

Software environment, technology, resource utilization,
production, and software characteristics data are stored for various
reporting periods for the life-cycle phases. In addition, four
levels of descriptive information are used to describe the software:
the project, system, functional group, and module levels. A
project consists of one or more systems and provides a solution
to a problem. A system consists of one or more functional groups
and provides a meaningful product to the user. A system is
usually capable of operating independently of other systems. A
functional group is a collection of modules which together satisfy
a set of functional and performance specifications. A module is
a discrete identifiable set of instructions handled as a unit by
an assembler, compiler, or loader. Queries can be formulated
across the projects, modules, systems, and functional groups.

Data summary forms were developed to record information from
the  technical  reports  for  the  six  datasets  in  the  Historical 
Database  and  to  provide  summarization  requirements  to  convert  the 
data  from  the  datasets  into  the  format  required  for  the  Summary 
Database.  Each  form  contains  eight  fields  that  provide  a basis 
for  defining  a unique  key  for  each  record  occurrence  within  the 
Summary  Database.  This  key  identifies  the  applicable  project, 
system,  functional  group,  and  module  that  applies  to  the  component 
information  recorded.  Also  included  in  this  key  is  information 
concerning  the  level  of  summarization  and  the  record  type  which 
indicates  the  format  of  the  data. 

In  addition  to  the  key  data,  the  following  information  is 
recorded  on  each  form. 

Component  (see  Figure  3-3).  Component  name,  type,  and 
description;  developer,  contract  number,  and  data  source;  the 
number  of  systems,  functional  groups  and  modules;  contract  type 
and  standards  applied;  the  purpose  of  the  data  collection  and  the 
procedures  used;  the  priorities  and  constraints  of  the  product 
development.

Technology (see Figure 3-4). The phase, reporting level and
the applicable dates; the technology utilized, the name of the
tool used, and the percentage of usage.
Figure 3-3. Component Data Summary Form
Figure 3-4. Technology Data Summary Form
whether the stress involves CPU time and/or a measure of the
quality of the tests. The constituent and complexity measure
types should be defined to provide a set of measurements
applicable across projects and modules.

The majority of the Data Acquisition Projects provide the
date of error detection and the date of correction. The one
exception is the Project 4 data, where only the date of
correction is reported. The only dataset that includes CPU time is
the Project 2 dataset, and only the Project 6 dataset includes
information on each test performed.

Hecht  (2)  differentiates  between  measuring,  estimating,  and 
predicting  software  reliability.  Measurement  implies  that  the 
software  operates  over  a period  of  time  and  segments  of  operation 
are  scored  as  failure  or  success.  A measurement  reliability  numeric 
is  normally  calculated  during  acceptance  testing  before  the  software 
is turned over to the user to determine if a reliability requirement
has been met. This reliability numeric can also be used to
determine  if  the  software  is  deteriorating  over  the  life  of  the 
product  and  to  determine  the  effect  on  reliability  of  different 
development  and  testing  tools  and  techniques. 

Estimation  is  taking  sample  reliability  measurements  in 
order  to  approximate  when  testing  will  be  completed  and  to  determine 
if  a reliability  goal  can  be  met.  The  estimation  reliability 
numeric  must  take  into  account  any  differences  from  the  operational 
environment  including  test  data  selection  and  reliability  growth. 

Prediction is a reliability statement not based on a measurement
of the operation of the software but on the actual or anticipated
attributes of the software such as the number of lines of
code. Prediction is used for project management purposes to estimate
the test and correction effort needed, to forecast operational
downtime, and to guide software design to meet reliability
requirements.

The  data  requirements  for  measuring  and  estimating  are  very 
similar, but the data needed for prediction varies because of
the difference in the nature of the assumptions. For measuring
and  estimating,  it  is  assumed  that  the  system  is  operating, 
and  the  data  reflects  the  operational  characteristics  of  the 
system.  With  prediction,  only  the  static  characteristics  are 
considered  and  data  can  be  acquired  or  determined  before  the 
program  is  operational. 

The Hecht reports (2,3,4) present the essential concepts in
the numerical evaluation of software reliability and simple
mathematical relations (models) that have been found useful in
the field.
The measurement models assume that the tests or runs (trials)
performed are those that are meaningful for the actual operational
environment. The simplest measurement model provides
a reliability numeric for a batch software system or a real-time
system dealing with discrete operations using the ratio of
successful trials to the total number of trials. This numeric can
be normalized to program length to account for differences in
exposure to failure between programs.
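
The ratio described above can be sketched in a few lines. This is only an illustration: the function name and the list of trial outcomes are hypothetical and are not taken from any of the six datasets.

```python
def success_ratio(results):
    """Return the fraction of trials that succeeded.

    `results` is a sequence of booleans, True for a successful trial
    and False for a failed one (an assumed encoding).
    """
    if not results:
        raise ValueError("at least one trial is required")
    successes = sum(1 for passed in results if passed)
    return successes / len(results)


# Hypothetical run outcomes, invented for illustration only.
trials = [True, True, False, True, True, True, False, True]
reliability = success_ratio(trials)
```

With six successes in eight trials the numeric is 0.75. Normalizing to program length is not shown because the text does not give the form of that adjustment.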

For real-time systems dealing with continuous data streams,
a practical reliability numeric is mean-time-between-failures
expressed as total running time (t), divided by the number of
failures (F) in the interval 0 to t. A normalizing factor for
this case is the number of instructions executed per unit-time.
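
As a minimal sketch of this numeric, the function below divides total running time t by the failure count F. The function name and the sample values are assumptions made for the example, not figures from the report.

```python
def mean_time_between_failures(total_running_time, failures):
    """Return t / F for the interval 0 to t.

    The numeric is undefined when no failures were observed, so that
    case is rejected rather than guessed at.
    """
    if failures <= 0:
        raise ValueError("at least one failure is required")
    return total_running_time / failures


# Hypothetical values: 120 hours of running time and 8 failures.
mtbf_hours = mean_time_between_failures(120.0, 8)
```

For these assumed values the result is 15 hours between failures.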

For  software  reliability  estimation,  if  the  software  is  being 
tested  in  the  operational  environment  and  the  test  cases  are 
representative  of  inputs  for  the  operational  environment,  then 
the  reliability  indices  calculated  during  measurement  can  be 
used  as  unbiased  estimators  to  estimate  future  reliability 
taking  into  account  reliability  growth  as  applied  to  operating 
time  and  error  removal. 

In the case where the test data impose more severe requirements
than actual usage, Hecht discusses using the techniques of
partitioning the input data sets and calculating the probability of
failure ascribed to the selection of input data.

Littlewood (5,6) discusses a model for estimating the
reliability of a software system based upon the attributes of each
component (or subprogram) and the interactions among them. The
inputs needed for this model include the failure rate for each
subprogram and a transition probability matrix which gives, for
each subprogram, the probability that control passes to each
other subprogram given that a switch occurs.

The model is also extended to include cost of failure by
inputting the mean and variance of the failure costs for each subprogram.

He  states  that  the  failure  rates  and  cost  parameters  for  each 
subprogram  can  be  estimated  from  test  data,  that  the  transition 
probability  matrix  in  a large  system  would  be  sparse,  and  that 
the  mean-time  spent  in  each  subprogram  should  be  able  to  be 
estimated. 
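
One simplified reading of how these three inputs combine is sketched below. It assumes that failures are rare compared with switching, so that the system failure rate is approximately the time-weighted average of the subprogram failure rates, and it ignores failures at the interfaces. This is an illustrative simplification and may not match the exact formulation in references 5 and 6. All names and numbers are hypothetical.

```python
def stationary_distribution(P, iterations=2000):
    """Long-run visit frequencies of the switching chain, by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    total = sum(pi)
    return [p / total for p in pi]


def time_fractions(pi, mean_times):
    """Fraction of running time spent in each subprogram."""
    weights = [p * m for p, m in zip(pi, mean_times)]
    total = sum(weights)
    return [w / total for w in weights]


def system_failure_rate(P, mean_times, failure_rates):
    """Time-weighted average of the subprogram failure rates (assumed form)."""
    fractions = time_fractions(stationary_distribution(P), mean_times)
    return sum(a * lam for a, lam in zip(fractions, failure_rates))


# Hypothetical three-subprogram system; every value is invented.
P = [[0.0, 0.7, 0.3],      # switching probabilities, each row sums to 1
     [0.6, 0.0, 0.4],
     [0.5, 0.5, 0.0]]
mean_times = [2.0, 1.0, 0.5]            # mean time per visit
failure_rates = [0.001, 0.004, 0.002]   # failures per unit time
rate = system_failure_rate(P, mean_times, failure_rates)
```

Power iteration is used only to keep the sketch free of external libraries. A sparse matrix, as expected for a large system, would call for a sparse solver instead.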

Littlewood  (6)  discusses  the  need  to  examine  the  special 
requirements  of  software  and  that  many  of  the  software  reliability 
measures  rely  too  much  on  hardware  analogies.  He  specifically 
argues  that  we  should  be  concerned  with  operational  reliability 
and  not  with  how  many  faults  are  in  the  program.  He  defines 
operational  reliability  as  the  reliability  of  the  program  as  it 
performs (failure rate, distribution of time to next failure, etc.).


The results of a project to develop software reliability
prediction models using regression analysis methods are presented
by Motley and Brooks (7). The authors concluded that the
predictability of programming error measurements varies from very low
to very high and that the variability is related to the functional
differences of the modules, the differences in the programming
language used, the length of time formal failure data collection
was carried out, the amount and thoroughness of testing, the
inadequacy of the linear model to provide perfect predictability,
and other programmer, project, and management factors affecting
the software development process. They recommend the
establishment of a baseline set of predictor variables initially starting
with their five and ten predictor summaries. These five and
ten predictor variables are dependent upon the project or
functional grouping. This baseline list should then be expanded to
reflect the results of further studies.

Their results indicated that the length of the program and
the number of program interfaces per 100 lines of source code
were found to be the best single predictors and that program
complexity variables contributed significantly to
predictability.
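
A single-predictor regression of the kind described can be sketched as follows. This is an ordinary least-squares line fitted to invented data. It is not the authors' model, and the module lengths and problem counts are hypothetical values chosen only so the arithmetic is easy to check.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two or more paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical modules: length in source statements and problems reported.
lengths = [100, 200, 300, 400]
problems = [3, 5, 7, 9]

intercept, slope = fit_line(lengths, problems)
predicted_for_250 = intercept + slope * 250
```

The five and ten predictor summaries would need a multiple regression. The single-variable case is shown only because program length is named as one of the best single predictors.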

Musa (8,9) postulates a software reliability model based
on execution or CPU time, and a concomitant model of the testing
and debugging process that permits execution time to be related
to calendar time. The main input consists of a set of
execution time intervals between failures experienced in testing,
along with the number of days from the start of testing on
which the failures occurred. Auxiliary inputs consist of 23
parameters including dates, computer time and man-hours required
per correction, personnel and computer availability, and the
mean-time-to-failure (MTTF) objective.

The output consists of a measurement numeric of the present
MTTF, an estimate of the MTTF objective attained, the remaining number
of faults to be uncovered and corrected to achieve the MTTF
objective, and an estimate of the remaining execution time and
calendar time required to meet the objective.

Thayer  (10)  and  Thayer  with  Lipow  (11)  discuss  what  they 
have  termed  a phenomenological  approach  to  software  reliability 
prediction.  Phenomenological  is  used  in  the  sense  of  relating 
to measurable software characteristics that experience has shown
are  well  correlated  with  reliability.  They  have  used  both 
standard  and  nonstandard  linear  regression  analysis  techniques 
applied  to  numbers  of  software  problems  as  a linear  function 
of  defined  software  reliability  characteristics. 
They have hypothesized that the best single predictor for
the number of problems is the number of branches, which is a
slightly better predictor than the number of statements. They
also state that the number of application program interfaces,
number of computational statements, and number of data-handling
statements are also good predictors, but that the number of branches
and the number of data-handling statements are highly correlated
and should not be used together. These predictors apply to
the number of software problems found during formal
demonstration.

Their  data  also  showed  that,  for  each  software  function,  the 
number  of  preoperational  problems  is  a fairly  good  predictor  of  the 
number  of  operational  failures  and  that  the  number  of  design 
problem  reports  is  a good  predictor  of  the  number  of  problems 
encountered in testing. Additional analysis shows that
operational failures for each software function are reasonably well
correlated  with  the  number  of  design  problem  reports  for  that 
function. 

An examination of some of the more widely used software
reliability models is presented in reference 12. This paper
addresses  analytic  models  that  predict  the  number  of  indigenous 
errors  remaining  in  the  program,  the  mean-time  to  the  next 
failure,  the  time  required  to  discover  all  remaining  errors,  and 
the  standard  deviation  associated  with  the  predictions.  Although 
the  authors  of  this  paper  state  that  all  error  prediction 
models  are  deficient  in  the  accuracy  of  the  model  predictions, 
the  insights  gained  from  studying  the  problem  have  provided 
guidelines  for  developing  and  testing  the  software. 

Sukert  (13,14)  reports  on  a study  to  analyze  the  results  of 
several  software  reliability  models  against  failure  data  obtained 
during  formal  testing  of  several  large  DoD  and  NASA  software 
development  projects.  No  consistent  patterns  emerged  in  this 
study. Results varied depending upon data content and
application type. He recommended that more detailed analysis is
needed  with  additional  datasets  and  that  better  ways  are  needed 
to  statistically  determine  the  accuracy  of  model  predictions. 

Goel and Okumoto (15) have developed a stochastic model for
software failure phenomena based on the case where errors are
not corrected with certainty. The following quantities of
interest are derived in this report: distribution of time to a
completely debugged system, distribution of time to a specified
number of remaining errors, distribution of number of remaining
errors, expected number of errors detected by time (t), and the
distribution of time between software failures.

The  required  data  for  models  described  in  (reference  12) 
include  the  date  the  error  was  detected.  From  this  information 
the  calendar  time  between  failures,  the  number  of  failures  per 
reporting  period  and  cumulative  number  of  errors  detected  at  a 
certain  date  can  be  computed. 
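
The three derived quantities named above can be computed directly from a list of detection dates, as in the sketch below. The function names, the choice of a calendar month as the reporting period, and the sample dates are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def days_between_failures(detected):
    """Calendar days between successive detections, in date order."""
    ordered = sorted(detected)
    return [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]


def failures_per_month(detected):
    """Count of detections keyed by (year, month); month is an assumed period."""
    return Counter((d.year, d.month) for d in detected)


def cumulative_detected(detected, as_of):
    """Number of errors detected on or before the given date."""
    return sum(1 for d in detected if d <= as_of)


# Hypothetical detection dates, invented for the example.
detected = [date(1977, 1, 3), date(1977, 1, 10),
            date(1977, 1, 24), date(1977, 2, 7)]

gaps = days_between_failures(detected)
per_month = failures_per_month(detected)
total_by_end_of_january = cumulative_detected(detected, date(1977, 1, 31))
```

For these dates the gaps are 7, 14 and 14 days, three errors fall in January and one in February, and the cumulative count at the end of January is three.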
The required data for models described in reference 2 and 2
include the date the failure occurred and the date corrected.

In addition to the above-mentioned data, the cumulative number of
corrections  made  and  the  number  of  errors  detected  and  corrected 
per  time-interval  can  be  determined. 

The data required for models described in references 2
and 5-9 include the amount of execution (CPU) time expended
before an error was detected. From this information CPU
time-to-failure, total running time-to-date, CPU
time-between-failures, and CPU time-per-interval can be computed. This CPU
time can be reported either for cumulative time before an error
occurs or for each trial, whether it be a success or a failure.
Reference 2 includes discussions on models that require a
recording of the date of each test (trial) and the result.
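
When CPU time is reported per trial, the quantities listed above follow from a running total, as the sketch below shows. It assumes that a failure is charged at the end of the failing trial. The record layout of (CPU hours, passed) pairs and the sample values are hypothetical.

```python
def cpu_time_to_failures(trials):
    """Return total running time and the cumulative CPU time at each failure.

    Each trial is a (cpu_hours, passed) pair. A failure is assumed to
    occur at the end of the failing trial.
    """
    elapsed = 0.0
    at_failure = []
    for cpu_hours, passed in trials:
        elapsed += cpu_hours
        if not passed:
            at_failure.append(elapsed)
    return elapsed, at_failure


def cpu_time_between_failures(at_failure):
    """CPU time separating successive failures, starting from time zero."""
    previous = 0.0
    gaps = []
    for t in at_failure:
        gaps.append(t - previous)
        previous = t
    return gaps


# Hypothetical trial records, invented for the example.
trial_records = [(0.5, True), (1.0, False), (0.25, True),
                 (0.75, True), (0.5, False)]

total_time, failure_times = cpu_time_to_failures(trial_records)
between = cpu_time_between_failures(failure_times)
```

Here the total running time is 3.0 hours, failures occur at 1.5 and 3.0 cumulative hours, and each gap is 1.5 hours.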

The prediction models discussed in references 7, 10 and 11
also require module length, number of interfaces, number of
branches, number of computational statements, number of data
handling statements, and other statement type counts (these
attributes have been termed constituent types). Also required
are complexity measures and the number of design problems
encountered.

Additional information that provides more meaning to the
results includes dates for testing phases, error descriptions
including type and severity, operating mode and processing
rate of the computer, stress type and measure (i.e., a measure
to indicate how well the test(s) correlate to the operational
environment and/or a measure of the amount or percentage of
code exercised), module attributes, resources spent in
correcting the error, and the phase in which the error was introduced.
The model discussed in references 8 and 9 requires additional
information including personnel and computer availability and the
project mean-time-to-failure objective.

The semi-Markov model discussed in reference 5 is actually
a system  level  reliability  estimation  model  in  that  an  estimate 
is  made  dependent  upon  the  failure  rate  for  each  subprogram, 
a probability  matrix  for  subprogram  "switching",  and  a measure 
of  how  much  time  would  be  spent  in  each  subprogram. 